PREFACE


The Examination Research Cell (ERC) of the Association of Indian Universities
has from time to time investigated various fundamental aspects of university
examinations: Internal Assessment, Grading, Test and Item Analysis, and Practical
Examinations, to name only a few. The results of these research projects have been
reported in the form of Monographs, some of which have been revised in subsequent
editions to include the experiences of teachers, colleges and universities during
implementation.


At the same time, certain research studies have been conducted, and it is felt
that these should be reported in the form of Research Abstracts. Three such
abstracts are now being prepared.


In this first Research Abstract, the following in depth studies have been 
included. 

1. On the optimum number of choices in multiple choice items 

2. The number of options as a variable in multiple choice items. 

3. A comparative study of facility and discrimination values of multiple 
choice item tests with varying options. 

4. Facility value and discrimination index of supply type questions. 

It is hoped that teachers, paper-setters, examiners and others will find these
research studies, and their results and conclusions, helpful and useful, and that
they will be guided towards better evaluation of their students' performance.


Constructive suggestions for advancing some of these studies will be most
welcome.


New Delhi. 

22nd July, 1978. 


V. Natarajan 
ON THE OPTIMUM NUMBER OF CHOICES IN MULTIPLE CHOICE ITEMS


INTRODUCTION 

As an important step towards reforming their examinations, many universities
in our country have restructured* the pattern of their examination papers to consist
of three sections or parts, viz. Part A, objective type items; Part B, short answer
questions; and Part C, long answer (essay/problem solving) questions. A few of these
universities, with the help of their more professional and progressive teachers, have
constituted initial banks of items, including objective type items and particularly
Multiple choice and Multiple facet items. The Association of Indian Universities†,
through its Research Cell, has built up Question/item banks in various first degree
level subjects, involving nearly 3000 teachers from various universities. At least
30 per cent of the total is made up of Multiple choice items. The quality of a Multiple
choice item depends to a large extent on its 'stem' and its 'distractors'. While the
'stem' is the question or the incomplete statement at the top of the item, the
'distractors' are those 'options' other than the 'key' or the correct answer. Usually
4 options are given, one of which is the 'key'; the other three options are called
'distractors'. The author** has reiterated elsewhere that the 'distractors' must
perform a 'dual' function: the more able students, looking at them, must dismiss them
as 'distractors', and the less able students must be 'attracted' towards them. This,
then, is the real function of 'distractors', and it will enable the 'key' to have a
positive and reasonable discrimination (the ability to differentiate between higher
ability and lower ability students). It follows, therefore, that a Multiple choice
item must 'elicit' a behaviour pattern characterised by the following: a) the 'key' is
chosen by a greater number of the higher ability group than of the lower ability
group, so that the 'key' has a positive discrimination; b) the 'distractors' are chosen
by a greater number of the lower ability group than of the higher ability group, so
that the 'distractors' have a negative discrimination. Elsewhere‡, an analysis for
the 'effectiveness' of distractors has been presented to highlight these points.


Some Results 


A 20 item Multiple choice test (in Educational Measurement) was given to 76
students, and a count of the choices made (four options in every item) was taken for
all items, separately for the upper 27%, the lower 27% and the middle 46% (Middle
group) of the students.


* Natarajan, V., Restructuring university examinations, University News, Vol. 4(11),
Nov. '76, pp. 6-8.

† Question Bank Book Series: 01 Mathematics, 02 Physics, 03 Chemistry, 04 Zoology,
05 Botany, 06 History, 07 Geography, 08 Psychology, 09 Economics, 10 Commerce,
New Delhi, AIU, 1977.

** Natarajan, V., Towards Better Questions (Item Writers' Cookbook), New Delhi, AIU.

‡ Natarajan, V., Monograph on Test & Item Analysis for Universities, New Delhi,
AIU, 1977.





For a few items, the results are given below: 


Item No. 1 (options A, B, C, D; the key is marked *)

[Table of option counts not recoverable from the source. Surviving entries:
18, 0, 15, 5, 0, 0, 33, 43.]

Item No. 5, Item No. 7, Item No. 18

[Tables of option counts not recoverable from the source.]

Here, for some items, one of the three distractors is ineffective, which turns the
4 choice item into one of 3 choices. For some other items, two of the three distractors
are ineffective, which turns the 4 choice item into a 'constant alternative' item of
two choices.

Yet, for a few items, all three distractors are 'ineffective'. A distractor is
understood to be 'ineffective' if it is not chosen by anyone at all, or if it behaves in
the same way as the key, i.e. more of the higher ability students choose it than of the
lower ability students. In respect of the key, every Multiple choice item must have the
right kind of facility value (F.V., or difficulty) and discrimination index (D.I.). It
is equally important that all the distractors have 'negative discrimination'. Multiple
choice items are seen to have 3, 4 or (sometimes) 5 choices per item.

We have just seen that a 4 option item may in effect be a 2 choice or 3 choice item. The
question therefore is: "WHAT IS THE OPTIMUM NUMBER OF OPTIONS FOR A MULTIPLE
CHOICE ITEM?" This question is sought to be answered here first from purely theoretical
considerations and later from practical experimentation.
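
The notion of an 'ineffective' distractor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the discrimination of an option is taken as (upper group count - lower group count) / group size, only the upper and lower groups are inspected, and the counts, the group size of 20 and the names `analyse_item` and `effective_choice_count` are hypothetical, not taken from the study.

```python
def analyse_item(upper, lower, key, group_size):
    """Return {option: (discrimination, effective)} for one item.

    upper, lower : dicts mapping option label -> number of examinees in the
                   upper / lower ability group who chose that option.
    key          : label of the correct option.
    group_size   : number of examinees in each of the two groups.
    """
    result = {}
    for option in upper:
        chosen_upper = upper[option]
        chosen_lower = lower[option]
        # Assumed index: difference in counts divided by the group size.
        discrimination = (chosen_upper - chosen_lower) / group_size
        if option == key:
            # The key should attract more of the upper group.
            effective = discrimination > 0
        else:
            # A distractor must be chosen by someone and must attract
            # more of the lower group than of the upper group.
            effective = (chosen_upper + chosen_lower) > 0 and discrimination < 0
        result[option] = (discrimination, effective)
    return result


def effective_choice_count(result):
    """Number of options (key included) that behave as intended."""
    return sum(1 for _, effective in result.values() if effective)


# Hypothetical counts for one 4 option item, 20 examinees per group.
upper_group = {"A": 1, "B": 16, "C": 3, "D": 0}
lower_group = {"A": 10, "B": 8, "C": 2, "D": 0}
item = analyse_item(upper_group, lower_group, key="B", group_size=20)
for option, (d, ok) in item.items():
    print(option, round(d, 2), "effective" if ok else "ineffective")
print("effective choices:", effective_choice_count(item))
```

In this made-up item, option D is never chosen and option C behaves like the key, so the 4 option item works in effect as a 2 choice item.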

APPROACHES 


A few approaches are available at this point in time for looking at this
problem from a 'theoretical' angle. Each approach makes the assumption that the total
number of alternatives is fixed, e.g. 40 Multiple choice items of 3 options each, or 30
items of 4 options each, or 24 items of 5 options each, thus keeping the total number
of options at 120. This will make sense if the total testing time for a set of N items
is proportional to the number A of choices per item. It seems likely that many or most
item types do not satisfy this condition, but doubtless some item types will be found
for which the condition can be shown to hold approximately. This means that those
multiple choice items that 'require' the student to read all the 'options' before
choosing (especially items testing higher order intellectual abilities) may satisfy
this kind of assumption. The real relation of N to A for a fixed testing time should be
determined experimentally for any given item type. When N is not inversely proportional
to A, the theoretical approaches given here may be modified in obvious ways to
determine the optimal value of A for each item type.
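
The fixed-total assumption NA = K can be illustrated with a short Python sketch that lists the test lengths N matching each number of choices A for K = 120. The function name `equivalent_test_lengths` is hypothetical and used only for illustration.

```python
def equivalent_test_lengths(total_options, choices=(2, 3, 4, 5)):
    """Return {A: N} for each A that divides total_options, so N * A == total_options."""
    return {a: total_options // a for a in choices if total_options % a == 0}


lengths = equivalent_test_lengths(120)
for a, n in lengths.items():
    print(f"{n} items x {a} options = {n * a} options in total")
```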

Many* have investigated the optimal number of alternatives for maximum test
reliability. Their empirical evidence is somewhat contradictory. Ruch and Stoddard
(1927) and Ruch and Charles (1928) conclude that, because more of such items can be
administered in a given length of time, two or three choice items give as good or
better results than do four and five choice items. Williams and Ebel (1957, p. 64)
report that "for tests of equal working time ... three choice vocabulary test items
gave a test of equal reliability, and two choice items a test of higher reliability, in
comparison with standard four choice items. However, neither of the differences was
significant at the 10% level of confidence". One way to eliminate choices is to drop
those shown by item analysis to be "least discriminating". This is a desirable
practical procedure that should yield better results than simply eliminating
distractors at random.


APPROACH OR METHOD I 


Let N be the number of items and A the number of choices per item. The optimal
number of choices is defined as the value of A that maximizes the 'discrimination
function' A^N. The main reason for choosing A^N is that it gives the total number of
possible distinct response patterns on N items of A choices each.

When NA = K is fixed (in the example discussed earlier, K = 120), A^N = A^(K/A), and

        A^N is maximized by A = e = 2.718.

For integer values of A, A^N is maximized by A = 3.

Therefore, when NA = K, three choices per item is optimal.
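
The maximization can be checked numerically. Taking logarithms, ln(A^N) = (K/A) ln A, whose derivative with respect to A is K(1 - ln A)/A^2, which vanishes at A = e. The sketch below (the function name is hypothetical) compares the integer candidates for K = 120.

```python
import math


def log_response_patterns(total_options, n_choices):
    """Natural log of A**N, where N = total_options / A."""
    return (total_options / n_choices) * math.log(n_choices)


K = 120
scores = {a: log_response_patterns(K, a) for a in (2, 3, 4, 5)}
best = max(scores, key=scores.get)
for a, value in scores.items():
    print(f"A = {a}: ln(A^N) = {value:.2f}")
print("best integer A:", best)
```

A = 2 and A = 4 give the same value, since 2^60 = 4^30; A = 3 gives the largest value.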


An experimental investigation is suggested here:

1. A test of 40 items with 3 choices each will be administered to at least 100
students, with a duration of 40 minutes.

2. The same test will then be given as 30 items with 4 choices each, for the
same duration.

These will be marked, and test and item analysis performed. The results will be
compared and reported.

* Toops (1921), Ruch and Stoddard (1925), Ruch, Degraff and Gordon (1926,
pp. 54-58), Ruch and Charles (1928).




APPROACH OR METHOD II 


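This approach is based on Grier's (1975) investigation. The K-R formula 21 for the
reliability coefficient is considered, and the coefficient (an approximation) is
maximized.

\[ r_{21} = \frac{N}{N-1}\left[1 - \frac{\bar{X}\,(N - \bar{X})}{N\,S_d^{2}}\right] \qquad \text{(I)} \]

Here

\[ \bar{X} = \frac{1}{2}\left(N + \frac{N}{A}\right), \]

where N is the maximum possible score and N/A is the chance score, and

\[ S_d = \frac{1}{6}\left(N - \frac{N}{A}\right), \]

i.e. one sixth of the difference between the maximum possible score and the expected
chance score.

Substituting for \bar{X} and S_d in (I), we get

\[ r_{21} = \frac{N}{N-1}\left[1 - \frac{\dfrac{N^{2}A^{2} - N^{2}}{4A^{2}}}{N \cdot \dfrac{N^{2}(A-1)^{2}}{36A^{2}}}\right]
          = \frac{N}{N-1}\left[1 - \frac{N^{2}(A^{2}-1)}{4A^{2}} \cdot \frac{36A^{2}}{N^{3}(A-1)^{2}}\right] \]

\[ \phantom{r_{21}} = \frac{N}{N-1}\left[1 - \frac{9\,(A^{2}-1)}{N\,(A-1)^{2}}\right]
          = \frac{N}{N-1}\left[1 - \frac{9\,(A+1)}{N\,(A-1)}\right] \]

This formula is useful only for large N.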

The table below shows the behaviour of r21 for N = 10, 20, 30, 40, 50, 60, 70,
80, 90, 100, 120, 150, 200 and 500 and for A = 2, 3, 4 and 5.

    N        A = 2      A = 3      A = 4      A = 5

    10      -1.88      -0.88      -0.55      -0.388
    20      -0.3684     0.1052     0.2631     0.3421
    30       0.1034     0.4137     0.5172     0.5689
    40       0.333      0.5641     0.641      0.6794
    50       0.4693     0.653      0.7142     0.7448
    60       0.5593     0.7118     0.7627     0.7881
    70       0.6231     0.7536     0.7971     0.8188
    80       0.6708     0.7848     0.8227     0.8417
    90       0.7078     0.8089     0.8426     0.8585
   100       0.7373     0.8282     0.8585     0.8737
   120       0.7815     0.8571     0.8823     0.8949
   150       0.8255     0.8859     0.9060     0.9161
   200       0.8693     0.9145     0.9296     0.9371
   500       0.9478     0.9659     0.9719     0.9749


A 40 item 3 option per item test seems to have a better value for reliability than 
a 30 item 4 option per item test. 
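
The comparison above can be reproduced with a short Python sketch of the approximation r21 = N/(N-1) [1 - 9(A+1)/(N(A-1))] used in this approach. The function name `kr21_approx` is hypothetical; the formula rests on the assumptions stated for this method (mean score midway between the chance score and the maximum score, standard deviation one sixth of their difference).

```python
def kr21_approx(n_items, n_choices):
    """Approximate K-R 21 reliability for N items of A choices each."""
    if n_items < 2 or n_choices < 2:
        raise ValueError("need at least 2 items and 2 choices per item")
    penalty = 9 * (n_choices + 1) / (n_items * (n_choices - 1))
    return (n_items / (n_items - 1)) * (1 - penalty)


# Tests with the same total of 120 options.
for n, a in [(40, 3), (30, 4), (24, 5)]:
    print(f"N = {n}, A = {a}: r21 = {kr21_approx(n, a):.4f}")
```

For a fixed total of 120 options, the 40 item, 3 option test gives the highest value of the three.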

APPROACH OR METHOD III


This approach uses the knowledge or random guessing assumption and the
reliabilities of hypothetical tests of equivalent items. This assumption is not
ordinarily satisfied in practice, but it is unlikely to lead to misleading conclusions.
The intercorrelation between two equivalent items under the knowledge or random
guessing model is given by

\[ r' = \frac{r}{1 + \dfrac{1}{(A-1)\,p}} \]

where r' is the product moment intercorrelation of n-choice items when n = A. Here p
and r denote the difficulty and the product moment intercorrelation of n-choice items
when n = \infty (Lord, 1974). By the Spearman-Brown formula, the reliability of number
right scores on a test composed of N equivalent A-choice items is found to be

\[ r'_{tt} = \frac{N r'}{1 + (N-1)\,r'} = \frac{N r}{(N-1)\,r + 1 + \dfrac{1}{(A-1)\,p}} \]

Since N = K/A,

\[ r'_{tt} = \frac{K r}{K r + (1-r)\,A + \dfrac{A}{(A-1)\,p}} \]

We wish to know what value of A, the number of choices, will maximize the test
reliability r'tt. The optimal value of A is the value that minimizes the denominator
of the equation above. The derivative of the denominator with respect to A is

\[ 1 - r - \frac{1}{(A-1)^{2}\,p} \]

Setting this equal to zero and solving for A, the optimal value is

\[ A = 1 + \frac{1}{\sqrt{(1-r)\,p}} \]

It is easily verified that this value of A provides a maximum rather than a minimum
for r'tt.



              p = 0.20    p = 0.50    p = 0.80

r = 0.10        3.36        2.49        2.18
r = 0.20        3.50        2.59        2.25
r = 0.30        3.67        2.69        2.34
r = 0.40        3.92        2.84        2.46
r = 0.50        4.18        3.00        2.58


The optimal values are independent of test length. For p = 0.5, this approach is
similar to the previous one. The table below shows some typical values of test
reliability for the case where p = 0.50.

                                A = 2    A = 3    A = 4    A = 5

K = 150; r = 0.30     r'tt      0.893    0.898    0.892    0.882

It is seen that a test of 3 option MC items yields better reliability.
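
The two results of this approach, the optimal A = 1 + 1/sqrt((1 - r)p) and the reliability r'tt = Kr / [Kr + (1 - r)A + A/((A - 1)p)], can be evaluated with the following Python sketch. The function names are hypothetical, and the sketch simply codes the two expressions under the knowledge or random guessing model.

```python
import math


def optimal_choices(r, p):
    """Value of A (not necessarily an integer) that maximizes r'tt."""
    return 1 + 1 / math.sqrt((1 - r) * p)


def reliability_fixed_total(total_options, n_choices, r, p):
    """r'tt for N = K / A equivalent A-choice items."""
    k, a = total_options, n_choices
    return k * r / (k * r + (1 - r) * a + a / ((a - 1) * p))


print("optimal A for r = 0.50, p = 0.50:", optimal_choices(0.50, 0.50))
values = {a: reliability_fixed_total(150, a, r=0.30, p=0.50) for a in (2, 3, 4, 5)}
for a, value in values.items():
    print(f"A = {a}: r'tt = {value:.3f}")
```

For K = 150, r = 0.30 and p = 0.50, the largest of the four values is obtained at A = 3.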
APPROACH OR METHOD IV 


A different perspective appears when the item characteristic curve (icc) model
is applied to this problem. The characteristic curve of an item gives the probability
of a correct answer to the item as a function of the examinee's ability. The function
is usually assumed to be a normal ogive or (it makes little difference) a logistic
function of ability, the range of the function being modified to allow for examinees
who get the correct answer by guessing. The icc is specified by three parameters that
characterize the item:

    The location parameter bi, the difficulty of item i,
    The scale parameter ai, the discriminating power of item i,
    The lower asymptote ci, the pseudo chance score level of item i.

Lord demonstrated that, for the values ci = 0.20, 0.250, 0.333 and 0.50, keeping ai
and bi constant, the test for which ci = 0.33 had superior values of reliability.
However, the item discriminating power depends on the number of choices per item.
This, of course, has not been taken into account in the study.

It is also true that the effect of decreasing the number of choices per item
while lengthening the test proportionately is to increase the efficiency of the test
for higher ability students and to decrease its efficiency for low-level students.
Suitable experiments can be devised to check the usefulness of this approach. It is
felt, of course, that three choice items will be better.
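
As an illustration of the three parameters, the logistic form of the icc can be sketched in Python as P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))). This is one common way of writing the logistic curve and is an assumption here, since the exact form used in the cited study is not given in the text; the function name and the parameter values are hypothetical. The values c = 0.20, 0.25, 0.333 and 0.50 correspond to 1/A for A = 5, 4, 3 and 2.

```python
import math


def icc_logistic(theta, a, b, c):
    """Probability of a correct answer at ability theta.

    a : scale parameter (discriminating power)
    b : location parameter (difficulty)
    c : lower asymptote (pseudo chance score level)
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


# Same a and b, lower asymptote set to 1/A for A = 5, 4, 3, 2.
for n_choices in (5, 4, 3, 2):
    c = 1 / n_choices
    low = icc_logistic(-3.0, a=1.0, b=0.0, c=c)
    mid = icc_logistic(0.0, a=1.0, b=0.0, c=c)
    print(f"A = {n_choices}: P(-3) = {low:.3f}, P(0) = {mid:.3f}")
```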


It is possible for us to look at other aspects of the problem of 3, 4 and 5 choices
per item. Multiple choice items with 5 options are common in American practice. There
is a reduced probability of guessing (the probability of guessing the correct answer
is 1/5 = 0.2). At the same time it must be said that, in terms of item writing,
additional effort is required to write 5 options. Multiple choice items with 4 options
are universal. The probability of guessing the right answer is 1/4 = 0.25. Multiple
choice items with 3 options will have an increased probability of guessing. It has been
well demonstrated that formula scoring (correction for guessing, R - W/(n - 1), where
R is the number of right answers, W the number of wrong answers and n the number of
options per item) does not alter the rank order, and the rank order correlation
between obtained scores and scores corrected for guessing is very high. Item writers
very often find it difficult to write more distractors which are all plausible. A 3
option item is easier to write than a 4 option item, and certainly much easier than a
5 option item.
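
The correction for guessing can be sketched as follows. When every item is attempted, W = N - R, so the corrected score R - (N - R)/(n - 1) is an increasing linear function of R and the rank order cannot change; when items are omitted, the order can differ slightly. The scores and names below are hypothetical.

```python
def corrected_score(right, wrong, n_options):
    """Score corrected for guessing: R - W / (n - 1)."""
    return right - wrong / (n_options - 1)


# Hypothetical raw scores on a 30 item, 4 option test; every item attempted,
# so the number wrong is 30 minus the number right.
n_items, n_options = 30, 4
raw = {"P": 24, "Q": 18, "R": 27, "S": 12}
corrected = {name: corrected_score(r, n_items - r, n_options) for name, r in raw.items()}

rank_raw = sorted(raw, key=raw.get, reverse=True)
rank_corrected = sorted(corrected, key=corrected.get, reverse=True)
print(corrected)
print("same rank order:", rank_raw == rank_corrected)
```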



THE NUMBER OF OPTIONS AS A VARIABLE IN MULTIPLE CHOICE ITEMS 


The multiple choice type of test item has gained considerable popularity among
constructors of standardized tests. It is also now widely used by classroom teachers.
The multiple choice item is the most flexible and versatile of all selection type
items. It may be used to measure instructional objectives at all levels of the
cognitive domain: Knowledge, Comprehension, Application, Analysis, Synthesis and
Evaluation. It may be considered the king of the selection-type items. Since a fairly
large number of items can be answered during a normal examination period, it is
possible to include items covering several instructional objectives in many content
areas. If a table of specifications is carefully prepared and used in the construction
of the test, the test is likely to have a relatively small content sampling error.
When contrasted with supply type items, scoring errors are of small concern in
multiple-choice tests. They may be scored rapidly, accurately and objectively.
Furthermore, the scoring of these examinations is not influenced by the examinee's
previous performance or by subjective judgement.


The multiple-choice item mainly consists of:

a) Stem:          which is at the top of the item, either in the form of a direct
                  question or an incomplete statement.

b) Options:       usually 3, 4 or 5 of these are given as (a), (b), (c), (d) and
                  (e), one below the other.

c) Key:           the correct answer among the options.

d) Distractors:   options other than the key.


Hence multiple-choice test items will have 5, 4 or 3 options, including the
correct answer or option. The options other than the correct answer or key are called
'distractors'. Their function is to distract those students who are uncertain of the
answer. Thus the primary purpose of distractors is to offer the examinee lacking the
requisite knowledge plausible choices which he may choose in preference to the keyed
response. Thus the quality of a multiple-choice item depends on the quality of its
stem and distractors. Since it is not easy to devise appropriate and discriminating
alternatives, wise teachers make a mental note of misconceptions and errors as they
occur in class exercises and recitations. Many of these classroom misconceptions and
errors can eventually be incorporated in test items. Nevertheless, it may sometimes be
necessary to use some completion type items simply because acceptable distractors
cannot be devised. The students' common errors and misconceptions shown in their
responses to completion items can be incorporated in future tests as distractors in
multiple choice items.


Number of distractors for optimum test reliability:

Theoretically, we do not know how the allotment of distractors to choice points
affects test reliability. The question has become very controversial. Costin (1970)
has reviewed the empirical support for the use of three-alternative items and presents
some new results favouring them. In addition, he finds the use of three-alternative
items to give more reliable tests than four-alternative items. More recently, Ramos
and Stern (1973) found no significant difference in certain regression systems between
four and five-alternative tests; however, the five-alternative tests were more
reliable. These results extend and further support the theoretical advantages of
three-alternative test items, by showing that their use also maximizes the expected
reliability of a test. But this is true only if the number of test items is increased
to compensate for the smaller number of alternatives per item.

Reliability of the tests : 

Three multiple choice tests of "General Knowledge", with varying numbers of
choices in each of the fifty questions, were given to the same set of fifty nine
students. The tests are designated Test No. 1, No. 2 and No. 3, with five options,
four options and three options respectively. The content of the tests, however, was
not changed. The reliability of all three tests was calculated by different methods
for comparison and is summarized below:

1. Split-half reliability

        Test No. 1      .8156      .8226
        Test No. 2      .8000      .8010
        Test No. 3      .8140      .7641

Rulon formula of reliability

        Test No. 1      ...        ...
        Test No. 2      ...        ...
        Test No. 3      .8060      .7589

4. Flanagan's formula of reliability

        Test No. 1      .8096      ...
        Test No. 2      .7933      .7926
        Test No. 3      .8060      .7589

5. Reliability by Mosier short cut method

        Test No. 1      Test No. 2      Test No. 3
          .6887           .6667           .6863

6. Reliability by KR-20 formula

        Test No. 1      Test No. 2      Test No. 3
          .8263           .8353           .8338

7. Reliability by KR-20 (modified) form based on 27% HAG and 27% LAG

        Test No. 1      Test No. 2      Test No. 3
          .9082           .9031           .9064

8. Reliability by KR-21 formula

        Test No. 1      Test No. 2      Test No. 3
          .7411           .7749           .7647

9. Reliability by Analysis of Variance

        Test No. 1      Test No. 2      Test No. 3
          .8144           .8252           .8202

10. Tucker modified K-R formula

        Test No. 1      Test No. 2      Test No. 3
          .8542           .8520           .8459


It is therefore clear from the results given earlier that the reliability of all the 
three tests is not very much different from each other. 
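
Two of the coefficients listed above, KR-20 and KR-21, can be computed from a matrix of item scores (1 = right, 0 = wrong) as in the Python sketch below. The item-level responses of the study are not reproduced in the source, so the data here are hypothetical; the sketch also assumes the variance of total scores is computed with divisor n (not n - 1), which is a convention choice.

```python
def _score_summary(responses):
    """Return (k, mean, variance) of total scores for a 0/1 response matrix."""
    n_examinees = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n_examinees
    variance = sum((t - mean) ** 2 for t in totals) / n_examinees
    return k, mean, variance


def kr20(responses):
    """KR-20: k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    k, _, variance = _score_summary(responses)
    n_examinees = len(responses)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_examinees
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / variance)


def kr21(responses):
    """KR-21: k/(k-1) * (1 - mean*(k - mean) / (k * variance))."""
    k, mean, variance = _score_summary(responses)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical responses: 4 examinees x 3 items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print("KR-20:", kr20(data))
print("KR-21:", kr21(data))
```

KR-21 treats all items as equally difficult, so it does not exceed KR-20 on the same data.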

Optimum number of options for a multiple choice item

The primary function of the distractors in a multiple choice test item is to
distract those students who are uncertain of the answer. A distractor is understood to
be ineffective if it is not chosen by anyone at all, or if it behaves in the same way
as the key, i.e. more of the higher ability students choose it than of the lower
ability students. In respect of the key, every multiple choice item must have the
right kind of facility value and discrimination index. It is equally important that
all the distractors have 'negative discrimination'. Multiple choice items are seen to
have 3, 4 or (sometimes) 5 choices per item. The question therefore is: "What is the
optimum number of options for a multiple choice item?" This question is sought to be
answered here first from purely theoretical considerations and later from practical
experimentation.

Approaches 

A few approaches are available at this point in time for looking at this
problem from a theoretical angle. Each approach makes the assumption that the total
number of alternatives is fixed, e.g. 40 multiple choice items of 3 options each, or
30 items of 4 options each, or 24 items of 5 options each, thus keeping the total
number of options at 120. This will make sense if the total testing time for a set of
N items is proportional to the number A of choices per item. It seems likely that many
or most item types do not satisfy this condition, but doubtless some item types will
be found for which the condition can be shown to hold approximately.

Let N be the number of items and A the number of choices per item. The optimal
number of choices is defined as the value of A that maximizes 'the discrimination
function' A^N. The main reason for choosing A^N is that it gives the total number of
possible distinct response patterns on N items of A choices each.

When NA = K is fixed (in the example discussed earlier, K = 120),

        A^N is maximized by A = e = 2.718.

For integer values of A, A^N is maximized by A = 3.

Therefore, when NA = K, three choices per item is optimal.

An experimental investigation can be suggested here as follows:

  i. A test of 50 items with 5 choices each will be administered to at least 60
     students, with a duration of 50 minutes.

 ii. The same test will be given to the same students with 4 choices per item,
     for the same duration.

iii. The same test will be given to the same students with 3 choices per item,
     for the same duration.

The purpose of this paper is to examine the effectiveness of three, four or
five options in multiple choice tests and the relationship that exists among these tests.


Tools: A general knowledge test was prepared for this study. The test,
containing fifty questions, was prepared in three forms:

1. test no. 1 with five options

2. test no. 2 with four options

3. test no. 3 with three options

The content of all three tests was the same; only the number of alternatives was
changed, to see its effect.


Sample: The Y.M.C.A. Engineering Institute, Faridabad, was selected for carrying
out this study. All three tests were given to the same set of fifty nine (59) students
studying in the 1st year class of the institute. The first test, with five options, was
administered on the morning of the first day. The second test, with four options, was
administered on the evening of the same day, and the third, with three options, was
administered the next morning. Fifty minutes were given to complete each of the three
tests. The examinees were asked to respond on a separate answer sheet.

Hypothesis: The following null hypothesis was formulated for the study: (H0) There
will be no significant difference among these three tests.


Analysis of the results : 

The hypothesis formulated can be tested with the technique of Analysis of Variance.
The total scores of the 59 students on these three tests can be treated as a two-way
classification with one observation per cell, the two factors of classification being
the examinees (at 59 levels) and the tests (at 3 levels). The null hypothesis to be
tested is:

H0: These three tests are not significantly different with respect to the alternatives
used in the tests.

Let a_ij, i = 1(1)59, j = 1(1)3, represent the total score of the ith student on the
jth test. We note the following calculations.

\[ G = \sum_{i}\sum_{j} a_{ij} = 4909 \]

\[ \sum_{j=1}^{3}\Big(\sum_{i=1}^{59} a_{ij}\Big)^{2} = (1488)^{2} + (1624)^{2} + (1797)^{2} = 8080729 \]

\[ \sum_{i=1}^{59}\Big(\sum_{j=1}^{3} a_{ij}\Big)^{2} = 414125 \]

The sums of squares due to the different sources are calculated as:

a) Correction factor (C.F.)

\[ \text{C.F.} = \frac{G^{2}}{59 \times 3} = \frac{(4909)^{2}}{177} = 136148.48 \]

b) Total sum of squares (T.S.S.)

\[ \text{T.S.S.} = \sum_{i}\sum_{j} a_{ij}^{2} - \text{C.F.} = 143259 - 136148.48 = 7110.52 \]

c) Sum of squares due to Examinees (S.S.Ex.)

\[ \text{S.S.Ex.} = \frac{1}{3}\sum_{i=1}^{59}\Big(\sum_{j=1}^{3} a_{ij}\Big)^{2} - \text{C.F.} = \frac{414125}{3} - 136148.48 = 1893.18 \]

d) Sum of squares due to Tests (S.S.T.)

\[ \text{S.S.T.} = \frac{1}{59}\sum_{j=1}^{3}\Big(\sum_{i=1}^{59} a_{ij}\Big)^{2} - \text{C.F.} = \frac{(1488)^{2} + (1624)^{2} + (1797)^{2}}{59} - \text{C.F.} = 2182.64 \]

e) Sum of squares due to Error (S.S.E.)

\[ \text{S.S.E.} = \text{T.S.S.} - \text{S.S.Ex.} - \text{S.S.T.} = 7110.52 - 1893.18 - 2182.64 = 3034.70 \]


The results may be summarized in the following ANOVA table.


ANOVA (Analysis of Variance) Table No. 1

Source of           Degrees of      Sum of Squares         Mean square error                 Variance ratio
variation           freedom

Due to Examinees    58              S.S.Ex. = 1893.18      M.S.Ex. = S.S.Ex./58 = 32.64
Due to Tests        2               S.S.T. = 2182.64       M.S.T. = S.S.T./2 = 1091.32       V.R.T = M.S.T./M.S.E.
Due to Error        58 x 2 = 116    S.S.E. = 3034.70       M.S.E. = S.S.E./116 = 26.16
Total               176             T.S.S. = 7110.52



For testing the null hypothesis (H0), the variance ratio of tests, which is
distributed as central F (with d.f. 2, 116), comes out to be 41.71.

Hence the calculated value of F(2, 116) = 41.71. From the F-distribution table, the
tabulated value of F(2, 116) = 3.075 at the 5% level of significance. As the calculated
value is greater than the tabulated value, the hypothesis (H0) has to be rejected at
the 5% level of significance.

Conclusion 1: Thus a significant difference exists among the tests.
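
The computation scheme used above (correction factor, total, examinee, test and error sums of squares, then the variance ratio) can be sketched in Python for any two-way layout with one observation per cell. The 59 x 3 matrix of raw scores is not reproduced in the source, so the sketch is run on a small hypothetical matrix; the function names are also hypothetical.

```python
def two_way_anova(scores):
    """Two-way ANOVA with one observation per cell.

    scores[i][j] is the score of examinee i on test j.
    Returns the sums of squares and the variance ratio for tests.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    grand_total = sum(sum(row) for row in scores)
    cf = grand_total ** 2 / (n_rows * n_cols)            # correction factor
    tss = sum(x * x for row in scores for x in row) - cf
    ss_examinees = sum(sum(row) ** 2 for row in scores) / n_cols - cf
    col_totals = [sum(row[j] for row in scores) for j in range(n_cols)]
    ss_tests = sum(t * t for t in col_totals) / n_rows - cf
    ss_error = tss - ss_examinees - ss_tests
    ms_tests = ss_tests / (n_cols - 1)
    ms_error = ss_error / ((n_rows - 1) * (n_cols - 1))
    return {"cf": cf, "tss": tss, "ss_examinees": ss_examinees,
            "ss_tests": ss_tests, "ss_error": ss_error,
            "f_tests": ms_tests / ms_error}


def ss_tests_from_totals(col_totals, n_examinees):
    """Sum of squares due to tests, computed from the test totals alone."""
    grand_total = sum(col_totals)
    cf = grand_total ** 2 / (n_examinees * len(col_totals))
    return sum(t * t for t in col_totals) / n_examinees - cf


# Hypothetical scores: 3 examinees x 2 tests.
table = two_way_anova([[1, 2], [3, 4], [5, 9]])
print(table)
print(round(ss_tests_from_totals([1488, 1624, 1797], 59), 2))
```

Applied to the printed test totals (1488, 1624, 1797), `ss_tests_from_totals` gives about 813.03 rather than the printed S.S.T. of 2182.64, so the printed figure may be worth rechecking against the raw scores. Assuming the printed T.S.S. and S.S.Ex., the variance ratio would then be about 10.7, which still exceeds the tabulated 3.075.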

In the light of this result, one may be interested in knowing which of the tests
differ significantly when taken two at a time. Hence we can formulate the following
hypotheses:


H01: The test with five options (No. 1) and the test with four options (No. 2) are
     not significantly different, against the alternative of a significant
     difference between test 1 and test 2.

This hypothesis may be tested on the basis of the results summarized in the
following table.


Analysis of Variance Table No. 2

Source of           Degrees of      Sum of Squares         Mean square error      Variance ratio
variation           freedom

Due to Examinees    58              S.S.Ex. = 3696.60      M.S.Ex. = 63.73
Due to Tests        1               S.S.T. = 156.75        M.S.T. = 156.75        V.R.T = M.S.T./M.S.E. = 9.69
Due to Error        58 x 1 = 58     S.S.E. = 937.65        M.S.E. = 16.16
Total               117             T.S.S. = 4791.00




For testing H01, the variance ratio of tests comes out to be 9.69 (which is
distributed as central F with degrees of freedom 1, 58).

Hence the calculated value of F(1, 58) = 9.69.

From the F-distribution table, the tabulated value of F(1, 58) = 4.05 at the 5% level
of significance. Since the calculated value is greater than the tabulated value, the
hypothesis H01 has to be rejected at the 5% level of significance. Hence a significant
difference exists between the test with five options and the test with four options.

H02: The test with five options (No. 1) and the test with three options (No. 3)
     are not significantly different, against the alternative that these are
     significantly different.


This hypothesis may be tested on the basis of the following results, obtained
by the same procedure and summarized in the table below.







Analysis of Variance Table No. 3

Source of           Degrees of      Sum of Squares         Mean square error                Variance ratio
variation           freedom

Due to Examinees    58              S.S.Ex. = 3561.45      M.S.Ex. = S.S.Ex./58 = 61.40
Due to Tests        1               S.S.T. = 809.17        M.S.T. = S.S.T./1 = 809.17       V.R.T = M.S.T./M.S.E. = 7.77
Due to Error        58 x 1 = 58     S.S.E. = 6040.33       M.S.E. = S.S.E./58 = 104.14
Total               117             T.S.S. = 10410.95


For testing H02, the variance ratio of tests comes out to be 7.77 (which is
distributed as central F with degrees of freedom 1, 58).

Hence the calculated value of F(1, 58) = 7.77.

From the F-distribution table, the tabulated value of F(1, 58) = 4.05 at the 5% level
of significance. Since the calculated value is greater than the tabulated value, the
hypothesis H02 has to be rejected at the 5% level of significance. Hence a significant
difference exists between the test with five options and the test with three options.

H03: The test with four options (No. 2) and the test with three options (No. 3)
     are not significantly different, against the alternative that these are
     significantly different.


This hypothesis may be tested on the basis of the following results, obtained
by the same procedure and summarized in the table below.


Analysis of Variance Table No. 4

Source of           Degrees of      Sum of Squares         Mean square error                Variance ratio
variation           freedom

Due to Examinees    58              S.S.Ex. = 957          M.S.Ex. = S.S.Ex./58 = 16.50
Due to Tests        1               S.S.T. = 253.64        M.S.T. = S.S.T./1 = 253.64       V.R.T = M.S.T./M.S.E. = 4.0006
Due to Error        58 x 1 = 58     S.S.E. = 3677.36       M.S.E. = S.S.E./58 = 63.40
Total               117             T.S.S. = 4888.00


For testing H03, the variance ratio of tests comes out to be 4.0006 (which is
distributed as central F with degrees of freedom 1, 58).

Hence the calculated value of F(1, 58) = 4.0006.

From the F-distribution table, the tabulated value of F(1, 58) = 4.05 at the 5% level
of significance. Since the tabulated value is greater than the calculated value, the
hypothesis H03 cannot be rejected at the 5% level of significance. Hence no significant
difference exists between the test with four options and the test with three options.

Conclusion: It is therefore clear from the results given earlier that the
hypotheses H0, H01 and H02 are rejected, whereas H03 is not rejected. Hence a
significant difference exists among the three tests taken together. Similarly,
significant differences exist between the test with five options and the test with
four options, and between the test with five options and the test with three options.
However, no significant difference exists between the test with four options and the
test with three options.
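
For a comparison of two tests taken by the same n examinees, the sum of squares due to tests reduces to (T1 - T2)^2 / (2n), where T1 and T2 are the two test totals, because (T1^2 + T2^2)/n - (T1 + T2)^2/(2n) = (T1 - T2)^2/(2n). The Python sketch below (hypothetical function name) applies this to the test totals 1488, 1624 and 1797 used in the overall analysis and gives values close to the S.S.T. entries of Tables No. 2, 3 and 4.

```python
def ss_tests_pairwise(total_a, total_b, n_examinees):
    """Sum of squares due to tests when only two tests are compared."""
    return (total_a - total_b) ** 2 / (2 * n_examinees)


totals = {1: 1488, 2: 1624, 3: 1797}   # test totals for the 59 examinees
for first, second in [(1, 2), (1, 3), (2, 3)]:
    value = ss_tests_pairwise(totals[first], totals[second], 59)
    print(f"Test {first} vs Test {second}: S.S.T. = {value:.2f}")
```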
A COMPARATIVE STUDY OF FACILITY AND DISCRIMINATION VALUES OF
MULTIPLE CHOICE ITEM TESTS WITH VARYING
OPTIONS


Examination and evaluation have always been considered an essential and
integral part of the educational system and have in recent years received more and more
attention. An objection commonly levelled against examinations is that they
often fail to serve the purposes for which they are employed. Many experienced
teachers and educationists sincerely believe that some examinations
foster undesirable habits among students and hinder the fulfilment of important
educational objectives. The fact that there is an examination to be faced at the
end of a course will undoubtedly affect the activities that teachers and students
jointly undertake, but there seems to be no reason why these encouraged activities should
not be of an educationally desirable kind. The old pedagogic slogan that there is no
impression without expression contains a truth that we all acknowledge. Moreover,
examinations compel students not only to acquire knowledge and skills but also to
reproduce their knowledge and to apply their skills. As an important step towards reforming
examinations, many universities in our country have restructured¹ their patterns
of examination papers to consist of three sections, viz.:

a. Objective type items.

b. Short answer questions.

c. Long answer (essay/problem solving) questions.

The word "item" is associated with the objective type, while the term "question"
is associated with the short answer and long answer essay types. The word "objective"
refers principally to the method of marking of the test rather than to its content. This
means that an objective test is one which is so constructed that the score of a
particular candidate on the test is independent of the marker. Questions/items
fall principally under two broad categories, viz. selection type and supply type. The former
type proves to be more adequate if the objective of testing is comprehension,
identifying relationships, application of facts and principles, analysis of elements
or evaluation. The second type tests abilities such as integration,
summarization, expression, reorganization or evaluation. Supply-type questions
can be further categorized into five sub-classes:

1. Simple question (answer may be a word, number, phrase or at the most
a sentence).

2. Completion question (again with a word, number, phrase or sentence).

3. Short-paragraph answer, short answer question (scope limited and
direction very clear).

4. Long-answer requiring anywhere between 200 and 2000 words.

5. Problem-solving type.

¹ Natarajan V. Restructuring University Examination. Universities News,
Vol.Mll), Nov. 76. p. 6-8.

Similarly, selection-type questions/items can also be categorized into five
sub-classes:

1. Constant-alternatives (True/False; Yes/No; Agree/Disagree etc.)

2. Multiple-choice (One among 3,4 or 5 alternatives) 

3. Multiple-Facet. 

4. Matching (simple and compound) 

5. Rearrangement. 

Multiple-choice items are the most versatile of all selection type items. A
multiple-choice item is one in which a number of alternative responses (three, four or five)
are provided, from which the student is required to select the correct answer. The
multiple-choice item consists mainly of a stem, options, a key and distractors. The stem is the
first part of the item, in which the task required of the student is clearly stated. The
options are all the responses offered as possible answers to the item. The key is the
correct answer and the distractors are the incorrect options.

Facility Value and Discrimination Index: The facility value of an item indicates how easy
or difficult it proved to be and is determined by calculating the percentage of students who
answered it correctly. Thus if half the students involved answer an item correctly and
the other half fail to do so, it is said to have a facility of 50%. What facility values are to
be regarded as acceptable will depend, of course, on the kind of test one is trying to
construct and on the purposes that it is intended to serve. It is clear, however, that in most
instances we are seeking to achieve some degree of discrimination between the able and
the less able, or between those who have and those who have not profited from our
instruction. It is unlikely, therefore, that we shall be content to include too many items with
extreme facility values. Items that none of the students can answer spread alarm and
despondency and do not serve any useful purpose. Those that all the students can answer
may be justifiable as morale-boosters, particularly at the beginning of a test, but fail
to disclose differences in levels of ability and attainment. The efficiency of an item
means the extent to which it pulls its weight. If the purpose of a test is to discriminate
between able and less able students, an efficient item is one which demonstrably serves
this purpose. To ensure that it is doing so, one needs not only to discover the percentage
of students who answer it correctly but also to find out what kind of students they are.

For example, suppose we find that a particular item was answered satisfactorily by
60% of the students. This means that, in terms of facility value, it has almost
certainly proved to be acceptable. Suppose, however, that we examine the performance of
the students on the test as a whole and divide them into three groups on the basis of
their total scores. If we now found that the 60% who had answered the item correctly were
equally represented in these three groups (20% higher, 20% middle and 20% lower), it would
be clear that the item was not performing a useful function. It is therefore inefficient,
in the sense in which we are using the term.
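
The situation just described can be sketched in a few lines. The data below are hypothetical and the helper names are illustrative only; the sketch assumes each response is scored 1 (right) or 0 (wrong) and that students are split into three equal groups by total score.

```python
# Sketch only: facility value of one item and the number of correct
# answers contributed by each third of the class (hypothetical data).

def facility_value(responses):
    """Percentage of students answering the item correctly."""
    return 100.0 * sum(responses) / len(responses)

def correct_by_group(responses, total_scores):
    """Count correct answers in the lower, middle and upper thirds by total score."""
    order = sorted(range(len(responses)), key=lambda i: total_scores[i])
    third = len(order) // 3
    groups = {
        "lower": order[:third],
        "middle": order[third:len(order) - third],
        "upper": order[len(order) - third:],
    }
    return {name: sum(responses[i] for i in idx) for name, idx in groups.items()}

totals = list(range(1, 16))        # 15 hypothetical students, total scores 1..15
item = [1, 1, 1, 0, 0] * 3         # item answered correctly by 9 of the 15

print(facility_value(item))
print(correct_by_group(item, totals))
```

Here the item has a facility of 60% but the correct answers are spread evenly over the three groups, which is the inefficient case described in the text.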



It is usual to express the difficulty of a test item by means of an index of difficulty.
When we calculate the index of difficulty for an item using our item analysis data, we use
only the information from the two groups, the high and the low, and do not use the
information from the students in the middle. So to obtain the index of difficulty we combine
the data from the two extreme groups. If, for instance, we had 40 students in each of our two
groups, we would determine the number of correct answers in each group, add them
and divide by 80. We would multiply the resulting decimal fraction by 100 to convert it to
a percentage and call the product our index of difficulty. By the discriminating power of a
test item we mean its ability to differentiate between students of high achievement and
students of low achievement. For a classroom test, we define high achievement and low
achievement in terms of the total scores on the test itself. All test items can be
classified as either a) positively discriminating, b) negatively discriminating or c)
non-discriminating. A positively discriminating item is one in which the percentage of
correct answers is higher for the upper group than for the lower group. A negatively
discriminating item is one in which the percentage of correct answers is higher for the
lower group than for the upper group. A non-discriminating item is one in which the
percentage of correct answers is the same for both groups.
During the construction of a test everybody wishes to eliminate negatively
discriminating and non-discriminating items, because only the items with positive
discriminating value make a positive contribution to the overall functioning of the test.
Each such item adds something to whatever it is that the test is measuring.
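
A minimal sketch of the procedure described above follows. The counts of right answers are hypothetical and the function names are illustrative; the only thing taken from the text is the rule of combining two extreme groups of equal size.

```python
# Sketch only: index of difficulty from the two extreme groups, and the
# three-way classification of an item by its discrimination.

def index_of_difficulty(right_upper, right_lower, group_size):
    """Percentage of correct answers in the upper and lower groups combined."""
    return 100.0 * (right_upper + right_lower) / (2 * group_size)

def classify_item(right_upper, right_lower):
    """Classify an item, assuming upper and lower groups of equal size."""
    if right_upper > right_lower:
        return "positively discriminating"
    if right_upper < right_lower:
        return "negatively discriminating"
    return "non-discriminating"

# Hypothetical item: 40 students per group, 30 right in the upper group
# and 18 right in the lower group.
print(index_of_difficulty(30, 18, 40))
print(classify_item(30, 18))
```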

Purpose of the study: The main purpose of this study is to make a comparative study of
the facility value and discrimination index of multiple-choice items in tests with varying
options (tests with three, four and five options). The difficulty of an item is determined
by the effectiveness of the foils and the degree of homogeneity exhibited by the alternates.
The greater the similarity among the alternates, the more difficult the item.

Approaches: A different perspective appears when the item characteristic curve
(i.c.c.) is applied to the problem. The characteristic curve of an item gives the
probability of a correct answer to the item as a function of examinee ability. The function is
usually assumed to be a normal ogive or (it makes little difference) a logistic function
of ability, the range of the function being modified to allow for examinees who get the
correct answer by guessing. The i.c.c. is specified by three parameters that characterize
the item:

i) The location parameter b_i, the difficulty of item i.

ii) The scale parameter a_i, the discriminating power of item i.

iii) The lower asymptote c_i, the pseudo chance score level of item i.
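
As an illustration of how the three parameters shape the curve, the sketch below uses the logistic form. It is an assumption of this sketch that no additional scaling constant is applied to the exponent (some treatments include one to bring the logistic close to the normal ogive); the function name and the sample values are illustrative.

```python
import math

# Sketch only: three-parameter logistic item characteristic curve.
# theta = examinee ability, a = discriminating power, b = difficulty,
# c = lower asymptote (pseudo chance score level).

def icc(theta, a, b, c):
    """Probability of a correct answer at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta = b the curve lies halfway between c and 1;
# for very low ability it approaches c.
for c in (0.20, 0.25, 0.33, 0.50):
    print(c, round(icc(0.0, 1.0, 0.0, c), 3), round(icc(-50.0, 1.0, 0.0, c), 3))
```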

Lord demonstrated that for various values of c_i = .20, .25, .33 and .50, keeping a_i and
b_i constant, the test for which c_i = .33 had the superior value of reliability. However, the item
discriminating power depends on the number of choices. It is also true that the effect of
decreasing the number of choices per item while lengthening the test proportionately is to
increase the efficiency of the test for higher ability students and to decrease its efficiency
for low-level students. It is possible for us to look at other aspects of the problem of 3, 4
and 5 choices per item. Multiple-choice items with 5 options are common in American
practice. There is a reduced probability of guessing (the probability of guessing the
correct answer is 1/5 = 0.2); at the same time, it must be said that, in terms of item writing,
additional effort is required to write 5 options. Multiple-choice items with 4
options are universal; the probability of guessing the correct answer is 1/4 = 0.25.
Multiple-choice items with 3 options have an increased probability of guessing (1/3 = 0.33).



The following hypotheses were formulated for this study: 


H01 There will be no significant correlation between the facility values
of test items with 5 options and those with 4 options.

H02 There will be no significant correlation between the facility values of
test items with 5 options and those with 3 options.

H03 There will be no significant correlation between the facility values
of test items with 4 options and those with 3 options.

H04 There will be no significant correlation between the discrimination
values of test items with 5 options and those with 4 options.

H05 There will be no significant correlation between the discrimination
values of test items with 5 options and those with 3 options.

H06 There will be no significant correlation between the discrimination
values of test items with 4 options and those with 3 options.

Tools: A General Knowledge test has been prepared for this study. The
test, containing fifty items, has been set in three forms with
varying options:

i) Test with five options (No. 1) 

ii) Test with four options (No. 2) 

iii) Test with three options (No. 3) 

However, the content of the three tests was not changed; only the number of
alternatives was changed in the items.

Sample: The students of the Y.M.C.A. Engineering Institute, Faridabad, were
purposively selected to carry out this study. All three tests were given to the same
set of 59 (fifty-nine) students, studying in the first year class of the Institute. The first
test was administered on the morning of the first day, the second test on the evening of
the same day and the third test on the morning of the next day. Fifty
minutes were given to complete each of the three tests. The examinees were
asked to respond on a separate answer sheet.

Intercorrelation among facility values and discrimination values:

The intercorrelations among facility values and discrimination values were found
for the given tests to verify the formulated hypotheses and are tabulated below:

Intercorrelation among facility values

              Test No. 1   Test No. 2   Test No. 3

Test No. 1    1            .9059        .8695

Test No. 2    .9059        1            .9107

Test No. 3    .8695        .9107        1


Intercorrelations among discrimination values

              Test No. 1   Test No. 2   Test No. 3

Test No. 1    1            .5852        .4966

Test No. 2    .5852        1            .3479

Test No. 3    .4966        .3479        1


Analysis of the results 

To test the formulated hypotheses (H01, H02, H03, H04, H05 and H06) for both facility
value and discrimination value, we can apply the following test.

In general, we test the hypothesis H0: ρ = 0, i.e. that the population correlation
coefficient is zero, against the alternative hypothesis that it is not zero. We can test
it by the following formula:

$$ t = \frac{r}{\sqrt{1 - r^{2}}}\,\sqrt{n - 2} $$

This statistic follows the t distribution with (n - 2) d.f.; here n = 50 items, so that
n - 2 = 48. In this way all the formulated hypotheses can be verified on the same lines,
as follows:

H01 Corr. I, II - Facility Value:

The value of r = .9059 for I and II; hence, putting the values in the formula,

$$ t = \frac{r}{\sqrt{1 - r^{2}}}\,\sqrt{n - 2} = \frac{.9059}{\sqrt{1 - .8206}}\,\sqrt{48} = \frac{.9059 \times 6.9282}{.4235} = \frac{6.2762}{.4235} = 14.8198 $$

The tabulated value of t with 48 d.f. at the 5% level of significance is
t(48; .05) = 2.0102.

As the calculated value is greater than the tabulated value, the
hypothesis has to be rejected at the 5% level of significance. Hence a significant
correlation exists between the facility values of the test with five options and those of
the test with four options.

H02 Corr. I, III - Facility Value:

The value of r = .8695 for I and III.

$$ t = \frac{r}{\sqrt{1 - r^{2}}}\,\sqrt{n - 2} = \frac{.8695}{\sqrt{.2440}}\,\sqrt{48} = \frac{.8695 \times 6.9282}{.4939} = \frac{6.024}{.4939} = 12.1968 $$

Hence the calculated value of t with 48 d.f. is 12.1968, which is greater than the
tabulated value (t(48; .05) = 2.0102); therefore the hypothesis H02 has to be rejected at
the 5% level of significance.

H03 Corr. II, III - Facility Value:

The value of r = .9107 for II and III.

$$ t = \frac{r}{\sqrt{1 - r^{2}}}\,\sqrt{n - 2} = \frac{.9107 \times 6.9282}{.4131} = 15.2735 $$

Hence the calculated value of t with 48 d.f. is 15.2735, which is greater than the
tabulated value (t(48; .05) = 2.0102); therefore the hypothesis H03 has to be rejected
at the 5% level of significance.

H04 Corr. I, II - Discrimination Value:

Similarly, the hypotheses formulated for discrimination values can also be
verified as follows:

The value of r = .5852 for I and II.

$$ t = \frac{r}{\sqrt{1 - r^{2}}}\,\sqrt{n - 2} = \frac{.5852}{\sqrt{1 - .3424}} \times 6.9282 = \frac{.5852 \times 6.9282}{.8109} = 4.9997 $$

Hence the calculated value of t with 48 d.f. is 4.9997, which is greater than the
tabulated value; therefore the hypothesis H04 has to be rejected.

H05 Corr. I, III - Discrimination Value:

The value of r = .4966 for I and III.

$$ t = \frac{r}{\sqrt{1 - r^{2}}}\,\sqrt{n - 2} = \frac{.4966}{\sqrt{1 - (.4966)^{2}}} \times \sqrt{48} = \frac{.4966}{.8679} \times 6.9282 = 3.9641 $$

Hence the calculated value of t with 48 d.f. is 3.9641, which is greater than the
tabulated value; therefore the hypothesis H05 has to be rejected.


H06 Corr. II, III - Discrimination Value:

The value of r = .3479 for II and III.

$$ t = \frac{r}{\sqrt{1 - r^{2}}}\,\sqrt{n - 2} = \frac{.3479}{\sqrt{1 - (.3479)^{2}}} \times \sqrt{48} = \frac{.3479}{.9375} \times 6.9282 = 2.5709 $$

Hence the calculated value of t with 48 d.f. is 2.5709, which is greater than the tabulated
value; therefore the hypothesis H06 has to be rejected.
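
The six calculations above all follow the same pattern and can be retraced with a short sketch. It assumes n = 50 items (so 48 d.f.), the correlations from the two intercorrelation tables and the tabulated t value quoted in the text; the labels and function name are illustrative.

```python
import math

# Sketch only: t statistic for testing a correlation coefficient against zero,
# applied to the six reported intercorrelations.

def t_for_correlation(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

N_ITEMS = 50      # number of items per test, giving 48 d.f.
T_TAB = 2.0102    # tabulated t(48; .05) as quoted in the text

correlations = {
    "H01 (F.V., I-II)": 0.9059,
    "H02 (F.V., I-III)": 0.8695,
    "H03 (F.V., II-III)": 0.9107,
    "H04 (D.I., I-II)": 0.5852,
    "H05 (D.I., I-III)": 0.4966,
    "H06 (D.I., II-III)": 0.3479,
}

results = {label: t_for_correlation(r, N_ITEMS) for label, r in correlations.items()}
for label, t in results.items():
    verdict = "rejected" if abs(t) > T_TAB else "not rejected"
    print(label, round(t, 2), verdict)
```

Recomputed values may differ from the hand calculations in the later decimal places because of intermediate rounding.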


Conclusion: It is therefore clear that the calculated values of t with 48 d.f. at the 5% level of
significance for all the formulated hypotheses are greater than the tabulated value. Hence
all the formulated hypotheses have to be rejected. This amounts to the fact that there
is a significant degree of correlation between tests of 5, 4 and 3 options in respect of both
the F.V. and the D.I. of individual items. It is therefore suggested that, on the basis of the
FV/DI of individual items, it does not matter at all whether we have 3 options, 4 options or
5 options. Only in the matter of guessing is there a difference: the probability of guessing
the right answer is higher with 3 options, namely 1/3, compared to that with 4 and 5
options, which are respectively 1/4 and 1/5. Wherever correction for guessing is being
applied, a 5 option test item may be better than those with 4 and 3 options. Correction for
guessing by formula scoring such as

$$ \text{Right} - \frac{\text{Wrong}}{\text{Number of options} - 1} $$

assumes that all wrong answers by students are due to guessing, whereas a wrong answer may
be due to ignorance. Moreover, when all items are attempted, correction for guessing keeps
the candidates in the same rank order as the uncorrected raw scores have put them. There is
in fact no need for correcting scores for guessing. In selection type items students do
guess, but in supply type questions students bluff; guessing is much better than bluffing,
since we solve many problems in our lives by intelligent guessing.
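
The formula score and the remark about rank order can be illustrated with a short sketch. The three candidates and their scores are hypothetical, and the sketch assumes that every candidate attempts every item, so that Wrong = number of items - Right.

```python
# Sketch only: correction for guessing (formula scoring) and its effect on
# rank order when no items are omitted (hypothetical candidates).

def corrected_score(right, wrong, n_options):
    """Formula score: Right - Wrong / (number of options - 1)."""
    return right - wrong / (n_options - 1)

N_ITEMS, N_OPTIONS = 50, 4
rights = {"A": 40, "B": 30, "C": 20}   # raw scores of three candidates

corrected = {
    name: corrected_score(r, N_ITEMS - r, N_OPTIONS) for name, r in rights.items()
}
raw_rank = sorted(rights, key=rights.get, reverse=True)
corrected_rank = sorted(corrected, key=corrected.get, reverse=True)

print(corrected)
print(raw_rank == corrected_rank)
```

When candidates omit different numbers of items the corrected and raw rank orders need not agree, which is why the sketch fixes the number of attempts.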


On the basis of FV/DI, a 3 option test item will serve as well as 4 or 5 option test
items would. This is particularly welcome, as the author's experience in various Question
Banking workshops with University/College teachers is that teachers are comfortable
with three options, while a 4th one, if compelled, will result in dummies like 'none
of the above' or unacceptable options like 'all of these'.



FACILITY VALUE AND DISCRIMINATION INDEX OF SUPPLY TYPE QUESTIONS 


The methods used to find the facility value and discrimination index of supply type
questions are essentially a generalization of those used in the case of objective type items, but
two factors, namely the subjectivity of marking and the choice among questions, complicate
their use and interpretation. In the case of a compulsory essay question, the F.V. may
be defined as the average (mean percentage) mark obtained on the question, i.e. the marks
obtained divided by the maximum marks for the question:

$$ \text{F.V.} = \frac{\text{Sum of marks by all candidates}}{\text{Sum of maximum marks obtainable on that question}} $$


This is just a generalization, and this F.V. is more difficult to interpret than the
similar one for an objective type item. A low F.V. for a compulsory essay question
may indicate that the question was inherently difficult, or that it was severely marked
while being inherently of average difficulty.
With essay type questions (choice or no choice papers) these two alternative
interpretations, which are not of course mutually exclusive (a question may be both
inherently rather difficult and severely marked), can always be offered, and it is up to
the examiners/teachers to review all the evidence to decide the relative effect of
the two factors. One way of dealing with this is to define the F.V. afresh
as the mean percentage mark which a homogeneous group of average
ability candidates (M_T = 50%) would be expected to obtain on the question,

$$ \text{i.e. F.V. of a question} = 50 + M_q - M_T $$

where M_q = mean percentage mark on the question obtained by those attempting
it,

M_T = mean ability index, i.e. percentage total marks obtained by
all those who attempted the question.

This is the Morrison index. There is another index suggested by Willmott &
Nuttall:

$$ \text{F.V.} = M + M_q - M_T $$

where M = mean % of the group on the total examination.

While the Morrison F.V. is a 'Sample Free' technique, the Willmott-Nuttall F.V. is not. It is
also possible to take up only the 27% Higher and 27% Lower groups to find
F.V. = (50 + M_q - M_T), and these are also reported in the illustrative example. One
additional thing brought out in the illustrative example is that when the FVs based on the
27% higher and 27% lower groups are added and averaged, they compare very well with the FVs
over the entire population.
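
The two indices can be written down directly. In the sketch below the symbols follow the definitions given above (M_q, M_T and M, all in percentage marks); the numerical values are hypothetical and the function names are illustrative.

```python
# Sketch only: the two facility values for an essay question described above.
# mq = mean percentage mark on the question by those attempting it
# mt = mean percentage total mark of those who attempted the question
# m  = mean percentage mark of the whole group on the total examination

def morrison_fv(mq, mt):
    """Morrison index: 50 + Mq - Mt."""
    return 50.0 + mq - mt

def willmott_nuttall_fv(m, mq, mt):
    """Willmott and Nuttall index: M + Mq - Mt."""
    return m + mq - mt

# Hypothetical question attempted by a slightly above-average group.
print(morrison_fv(mq=60.0, mt=55.0))
print(willmott_nuttall_fv(m=52.0, mq=60.0, mt=55.0))
```

The only difference between the two is the reference level: a fixed 50% in the first case and the observed group mean in the second, which is why the second depends on the sample.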

It was also suggested that M_T, the mean ability index, may be obtained by taking
every student's total marks on the paper and subtracting from it the marks on that
question. This is assumed to be a better indicator of the ability of the candidate. It is
however left to the examination unit/examiner/teacher to choose the method
considered to serve a definite purpose. In the section dealing with simplified methods,
a method suggested by Dr. Harper is also included for calculating F.Vs.

Discrimination Index : 

Discrimination index is defined as an index indicating the ability of the item to
discriminate (positively) between the higher ability students and the lower ability students.
Discrimination is usually measured by the correlation between the score on the item and
the score on the total test. In theory, values of D.I. may range from -1.00 to +1.00.

A value greater than +0.30 among a sample of candidates numbering 150 or more
generally indicates a satisfactory degree of discrimination. Values between 0 and 0.30
indicate that the items need improvement, while those with negative values must be
discarded. A very common step in finding the D.I. is to divide the total sample into two groups
on the basis of the criterion. The obvious question is whether the two groups, which may
be upper/lower halves, or quarters, or 27% - 27%, or 10% - 10%, or other proportions
of equal numbers, behave differently with respect to the item. The simplest index
from this source is the difference (P_U - P_L), where P_U and P_L are the proportions of
examinees answering the item right in the upper and lower groups respectively. An easily
obtained derivative of this difference is (Z_U - Z_L), where Z_U and Z_L are the normal
curve deviates corresponding to P_U and P_L. A nomogram has been prepared by Lawshe (1971);
this method is called the D method and the index is D = Z_U - Z_L.

Johnson puts the upper-lower difference P_U - P_L in a form more convenient
for computing by using the formula

$$ \text{ULI} = \frac{R_U - R_L}{f} $$

where ULI = upper/lower index (D.I.),

R_U, R_L = numbers giving the right answer in the upper and lower groups
respectively,

f = number of examinees in each group.

Johnson recommends using 27% in each group, in which case f = 0.27 N. He
provides a standard error formula for the ULI which reads:

$$ \sigma_{ULI} = \frac{1}{f}\sqrt{R_U + R_L} $$

Where the difference P_U - P_L is used as the index, an ordinary critical ratio test
can be applied to determine the significance of the difference in proportions. A chi
square test can also be applied to the frequencies R_U and R_L, the results of which would
tell the same story as the critical ratio. Guilford has shown that when we know the
proportions who pass the item in equal upper and lower criterion groups, the formula
for chi square reduces to:

$$ \chi^{2} = \frac{N\,(P_U - P_L)^{2}}{4pq} $$

where p may be taken as the arithmetic mean of P_U and P_L, and q = 1 - p.
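
The upper/lower index and the reduced chi square can be sketched as follows. Two readings are assumed here and are not stated outright in the text: that ULI = (R_U - R_L)/f, which is how the definitions of R_U, R_L and f are understood, and that N in the chi square formula is the combined number of examinees in the two groups. The counts are hypothetical.

```python
# Sketch only: upper/lower index and the reduced chi square for equal groups.

def upper_lower_index(r_upper, r_lower, f):
    """ULI = (R_U - R_L) / f, with f examinees in each group."""
    return (r_upper - r_lower) / f

def chi_square_equal_groups(p_upper, p_lower, n_total):
    """chi^2 = N (P_U - P_L)^2 / (4 p q), with p the mean of P_U and P_L."""
    p = (p_upper + p_lower) / 2.0
    q = 1.0 - p
    return n_total * (p_upper - p_lower) ** 2 / (4.0 * p * q)

# Hypothetical item: 20 examinees in each group, 18 right in the upper
# group and 5 right in the lower group.
f = 20
uli = upper_lower_index(18, 5, f)
chi2 = chi_square_equal_groups(18 / f, 5 / f, 2 * f)
print(round(uli, 2), round(chi2, 3))
```

With equal groups this chi square agrees with the ordinary 2 x 2 chi square computed from the four cell frequencies (without continuity correction).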

There are a number of different ways of calculating a D.I. Four coefficients
of correlation are commonly used to indicate the correlation of an item with a
criterion. They are the biserial r, the point-biserial r, the tetrachoric r and the phi coefficient.

If we are interested in the correlation between the variable that the item measures
and the continuous criterion measure, and if we may assume that the thing measured by
the item is continuously and normally distributed in the population, the biserial r is the
coefficient we want. If the criterion variable is also normally distributed in the population,
it can be dichotomized and a tetrachoric r may be computed. If we are interested in how
well we can predict the criterion from the item, or how much it can contribute to a total
score with its own score limited to 0 or 1, the point biserial r is the coefficient to compute.
The test theory that regards a total score as the summation of item scores assumes this
type of correlation. The phi coefficient may be applied when the total score distribution is
arbitrarily dichotomized at some cutting score because the test will be used to
discriminate at that level.


1. The Biserial r in item analysis:

The best formula to use for the biserial r in the item analysis application
is:

$$ r_b = \frac{M_p - M_t}{\sigma_t} \times \frac{p}{y} $$

where M_p = mean criterion score of those passing the item
(total score on the test)

M_t = mean criterion score of all examinees

σ_t = s.d. of all total scores

p = proportion passing the item

y = ordinate in the unit normal distribution corresponding to p.

Where complete power conditions do not prevail, it would be best to use only statistics
based on those who attempted the item. This would mean that M_t and σ_t will vary for
some items.




To illustrate its use with an example, let us take item No. 1 in our 20-item
test on 76 students.

p = 43/76 = 0.566

p/y = 1.438 (taken from the table)

M_p = 608/43 = 14.14

M_t = 12.86

σ_t = 2.79

$$ \therefore\; r_b = \frac{14.14 - 12.86}{2.79} \times \frac{p}{y} = \frac{1.28}{2.79} \times 1.438 = 0.657 $$


Alternatively 

The standard error of r_b to use in testing for significant departure from a
correlation of zero can be estimated by the formula

$$ \sigma_{r_b} = \frac{1}{\sqrt{N}} \times \frac{\sqrt{pq}}{y} = \frac{1}{\sqrt{76}} \times 1.259 = 0.145 $$

2. The Point Biserial r

The formula for the point biserial r adapted to item analysis is:

$$ r_{pbi} = \frac{M_p - M_t}{\sigma_t} \times \sqrt{p/q} $$

Here M_p = 14.14, M_t = 12.86, σ_t = 2.79 and √(p/q) = 1.142, so that

$$ r_{pbi} = \frac{1.28}{2.79} \times 1.142 = 0.525 $$

r_pbi can be estimated from the biserial r by the relationship

$$ r_{pbi} = r_b \times \frac{y}{\sqrt{pq}} = 0.657 \times \frac{0.3944}{0.4964} = 0.525 $$

Again it is possible to look into the graph and read straight.
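
The biserial and point biserial calculations for item No. 1 can be retraced with the sketch below. It assumes the rounded inputs quoted in the example (43 of 76 passing, M_p = 14.14, M_t = 12.86, σ_t = 2.79) and computes the normal ordinate with Python's standard library instead of reading it from a table; the function names are illustrative. With these inputs the results come out at roughly 0.66 and 0.52, close to the hand-calculated figures, the small differences being due to rounding.

```python
import math
from statistics import NormalDist

# Sketch only: biserial r, point biserial r and the standard error of the
# biserial r for a single item, using a computed normal ordinate.

def ordinate(p):
    """Height of the unit normal curve at the point cutting off proportion p."""
    z = NormalDist().inv_cdf(p)
    return NormalDist().pdf(z)

def biserial_r(m_p, m_t, sd_t, p):
    return (m_p - m_t) / sd_t * (p / ordinate(p))

def point_biserial_r(m_p, m_t, sd_t, p):
    return (m_p - m_t) / sd_t * math.sqrt(p / (1.0 - p))

def se_biserial(p, n):
    """Standard error of the biserial r under a true correlation of zero."""
    return math.sqrt(p * (1.0 - p)) / (ordinate(p) * math.sqrt(n))

p = 43 / 76
r_b = biserial_r(14.14, 12.86, 2.79, p)
r_pbi = point_biserial_r(14.14, 12.86, 2.79, p)
converted = r_b * ordinate(p) / math.sqrt(p * (1.0 - p))  # r_pbi from r_b

print(round(r_b, 3), round(r_pbi, 3), round(converted, 3), round(se_biserial(p, 76), 3))
```

The conversion from the biserial to the point biserial r is an exact identity, so the last two correlations printed agree.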


3. The tetrachoric r


The use of the tetrachoric r in item analysis would be prohibitive without
computing aids. An abac (Mosier and McQuitty) enables the tetrachoric r to be read
off when P_U and P_L are known. The total score distribution must be dichotomised
at the median. N should be large (as large as 400) because of the large sampling error.
The standard error is computed by

$$ \sigma_{r_t} = \frac{1.253}{\sqrt{N}} \times \frac{\sqrt{pq}}{y} = \frac{1.253}{\sqrt{76}} \times 1.259 = 0.180 $$


4. The Phi Coefficient


In the item analysis situation, where the upper and lower groups are equal in number,
Guilford has shown that the formula for the Phi Coefficient is simplified to:

$$ \phi = \frac{P_U - P_L}{2\sqrt{pq}} = \frac{0.90 - 0.25}{2 \times 0.4964} = \frac{0.65}{0.9928} = 0.65 $$
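
A short sketch of the simplified phi coefficient follows. It assumes, as in Guilford's reduced chi square, that p is taken as the arithmetic mean of P_U and P_L; the example above appears to use a value of √(pq) carried over from the earlier item, so the third decimal place may differ. The function name is illustrative.

```python
import math

# Sketch only: phi coefficient for equal upper and lower groups.

def phi_equal_groups(p_upper, p_lower):
    """phi = (P_U - P_L) / (2 * sqrt(p * q)), p being the mean of P_U and P_L."""
    p = (p_upper + p_lower) / 2.0
    q = 1.0 - p
    return (p_upper - p_lower) / (2.0 * math.sqrt(p * q))

phi = phi_equal_groups(0.90, 0.25)
print(round(phi, 2))
```

Under the same assumption, N times the square of phi reproduces the reduced chi square for N examinees in the two groups combined.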


D. I. of essay type questions : 

As in the case of an objective item, the D.I. of an essay question (choice or no
choice type exam.) is simply the correlation between the mark on the question and the
mark on the whole paper (or section), with the difference that Pearson's product moment
correlation coefficient (rather than the specialised biserial or point biserial) can be used.
This has no effect on the way in which the D.I. is interpreted. However, an objective
type test usually consists of a relatively large number of items, so that a single item
contributes only a little to the final total, while an essay examination usually requires
only half-a-dozen or so questions to be answered. Each essay question contributes an
appreciable amount to the total mark, and the correlation between the mark on the
question and the mark on the paper is spuriously high. For this reason, a satisfactory
D.I. for an essay question is rather higher than that required for an objective item,
and only values > 0.50 indicate that a question is showing adequate discrimination. The
most likely causes of inadequate discrimination in an essay question are a failure on the
part of the markers to use the whole of the available mark range and disagreement on the
part of the markers about the qualities that characterize a good answer.

In the illustrative example involving QN I compulsory (maximum marks 15) and
any 5 out of 7 to be answered by 117 students, the D.I. is calculated by correlating

1. the mark on the question (whose D.I. is to be calculated) with the
total mark on Section A;

2. the mark on the question (whose D.I. is to be calculated) with the
total mark on Section A minus the mark on the question.

This is done for the entire population. A simplified formula has been suggested
by Dr. Edwin Harper and this is also made use of in the illustrative example. Two
illustrative examples are to follow.

1. An objective type test of 20 items on 76 students 

2. A choice type examination with 8 questions (Question I compulsory 
and any 5 out of 7 questions). 
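
The two ways of computing the D.I. of an essay question described above can be sketched as follows. The marks are hypothetical, the function names are illustrative and the Pearson coefficient is written out by hand so that the sketch needs only the standard library.

```python
import math

# Sketch only: D.I. of an essay question as the Pearson correlation of the
# question mark with (1) the section total and (2) the section total minus
# the mark on the question itself.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def essay_di(question_marks, section_totals):
    """Return (D.I. against the total, D.I. against the total less the question)."""
    rest = [t - q for q, t in zip(question_marks, section_totals)]
    return pearson_r(question_marks, section_totals), pearson_r(question_marks, rest)

# Hypothetical marks of six candidates on one question and on the section.
question = [12, 9, 14, 6, 10, 8]
totals = [60, 48, 66, 35, 50, 45]

di_total, di_rest = essay_di(question, totals)
print(round(di_total, 3), round(di_rest, 3))
```

The second figure removes the question's own contribution to the total, which is the source of the spuriously high correlation mentioned in the text.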

Item/Question Analysis of Choice Type Examination: Facility Value

The facility index for a written question (essay type) might be defined (as
suggested by Drake) as the sum total of the marks obtained for the question divided by the
total available marks for the question and expressed as a percentage. This would be
the mean percentage mark for the question (M_q). For this Section A, the table below
shows the mean percentage marks for the 8 questions.


Section A             QN I    QN II   QN III  QN IV   QN V    QN VI   QN VII  QN VIII

M.P.M. or F.V. (%)    77.13   71.5    72.8    51.9    15.0    57.3    31.9    61.7


If the total entry or a representative group answered the question, the facility index
so defined will be useful to have. But what if the group happened to contain a
disproportionate number of lower ability candidates? This would have the effect of
depressing the index. Similarly, a disproportionate number of higher ability candidates would
elevate the index. The facility index so defined is therefore not a conservative one,
but depends on the effect of choice. The facility index would not depend so much on
the performance on that question by the total entry as on the performance by a
possibly atypical group. This choice factor does not complicate the determination
of the F.V. of an objective item. In discussing a possible facility index for a
question, it is assumed that the question has been reliably marked. If this is not the
case, a further factor is introduced which will be a function of the examiner: if the
examiner is lenient the F.V. will be artificially raised, and vice versa. This influence
again is absent in the case of an objective test. An allowance may have to be made for
the effect of choice as well as for the effect of marking in the case of essay questions.

A high facility index in the case of an objective type item means it is an easy item.
A high facility index for an essay question could mean an easy question but could
equally mean lenient marking.

A matched question would be one on which an average ability student would
score 50%. This means that the marking is neither lenient nor severe. If such a question
is ideally discriminating, we would expect a student of ability 1 standard deviation
above average to obtain a mark for the question which is 1 standard deviation above
50%, and a student of ability 1 standard deviation below average to obtain a mark which is
1 standard deviation below 50%. If the marks are standardised with mean = 50% and
s.d. = 15%, then these two students will obtain 65% and 35% respectively. This is merely
stating in another way what the question is meant to achieve if it is functioning properly.
We may extend this to homogeneous groups of students. If a homogeneous group of average
ability students attempted the question, we would expect a mean mark of 50%. If it is an
easy question the mean mark may be 60% or more, and for a difficult question the mean mark
may be 40% or less. We may use this as a basis for defining a Facility Value for the essay
question. The Facility Value may be defined as the mean percentage mark gained on
the question by a homogeneous group of average ability candidates. In practice, we
do not have a homogeneous group of candidates. However, the mean percentage mark
gained on a question by a sample taken from the population of students would be the
same as that for a homogeneous group of the same mean ability if the question is
matched and discriminating properly. It might be useful to refer to the graph below. Here
the horizontal axis represents on a percentage scale the mean ability of the candidates
attempting a question. In the absence of a better criterion this may be taken as the
mean of the total percentage marks for the whole paper, M_T, obtained by those
candidates attempting this question. This assumes an ideal examiner and also that each
question is matched, i.e. behaving similarly to the one under consideration. The
vertical axis represents the mean percentage mark obtained for the question, M_q, by
those candidates attempting it. This constitutes the basis for the question-synoptic
chart.


Let us suppose that all questions are ideal, that they have all been answered
by a representative entry from the population so that M_T = 50, and that they have been
marked by ideal examiners. In this case each question would have an M_q value of 50%,
and each M_T value for the candidates attempting the question would also be 50%. Such
an ideal case would result in each question having the same M_T, M_q coordinates, and
they would all fall on the 50%, 50% origin point. The questions are matched and the
discrimination perfect. This is not likely in practice, but it still serves as a
starting point for understanding the effect of changing each variable in turn. What is the
effect of changing only the mean ability of the entry? The questions and examiners
remain ideal. If the entry is of lower ability, the M_T value will be lower, but so also
will be the value of M_q. With ideal questions and examiners, a 40% mean ability group
would obtain a mean of 40% on each matched question. The effect of the different entry
would be to locate all the questions at 40%, 40% rather than 50%, 50%. In general, the
question will be located at some point on the line of unit slope passing through the origin
of 50% - 50%. It is important to note that this line represents the locus of performance by
different ability groups on ideal questions marked by ideal examiners. Although the mean
mark on the question has changed, this is entirely due to the change in ability of the
entry. It is because of this that the mean percentage mark for a question cannot be used
as a Facility Value, in that it is a function of the group answering the question.


What is the effect of choice? The effect is to yield a different MT value for each
question, and instead of all the questions being located at one particular point on the
line of unit slope, they will now be located at different points on the line of unit slope.
Thus in the ideal situation with choice, all the questions previously located at 50%, 50%
will be distributed along the line of unit slope passing through 50%, 50%. As the only
variable is the mean ability of the group answering the question, the facility of that
question is unchanged: it remains an ideal question with a Facility Value of 50%. For
this reason the line passing through 50%, 50% is called the Facility datum line. All
questions falling on this line will have a Facility Value of 50%.

Let us return to the ideal situation and consider the effect of changing the questions.
Suppose that an MT = 50% entry obtains a mean of 60% on a question when marked by ideal
examiners. In this case, the question will be located at the 50%, 60% point. The effect
of choice is to distribute questions such as this one along a line of unit slope now
passing through 50%, 60%. All questions on this line have a Facility Value of 60%,
because an MT = 50% entry would obtain a mean percentage mark of 60% on them. Similarly,
a family of parallel lines of unit slope can be thought of, each representing the
Facility Value given by its Mq value on the MT = 50% vertical axis.
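The family of parallel lines described above can be sketched in a few lines of Python. This is only an illustration under the ideal assumptions of the text (matched questions, ideal examiners); the function name and the three ability groups are hypothetical, and the sketch ignores the 0 to 100% limits of the mark scale.

```python
def expected_mean_mark(facility_value: float, mean_ability: float) -> float:
    """Mq expected on the unit-slope line for a question of the given F.V.

    Both arguments are percentages; mean_ability is MT for the group
    attempting the question.
    """
    return facility_value + (mean_ability - 50.0)


# Hypothetical ability groups (MT in percent), chosen only for illustration.
ABILITY_GROUPS = [40.0, 50.0, 60.0]

# One line of unit slope per Facility Value: 50% (datum line) and 60%.
LINES = {
    fv: [expected_mean_mark(fv, mt) for mt in ABILITY_GROUPS]
    for fv in (50.0, 60.0)
}

for fv, marks in LINES.items():
    print(f"F.V. {fv:.0f}%: Mq for MT = {ABILITY_GROUPS} is {marks}")
```

Each line shifts Mq by exactly the amount MT departs from 50%, which is why questions on the same line share one Facility Value whatever group attempted them.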

Question indices 

Choice Index (C.I.)

The Choice Index is a measure of the popularity of the question. It is defined as
the percentage of the total population that attempted the question. A compulsory question
attempted by all will have a C.I. of 100%; others will range from 0 to 100% in a choice
type examination.
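A minimal sketch of this definition in Python follows. The function and variable names are hypothetical; the attempt counts and the entry of 117 candidates are the bracketed figures reported in the Question Analysis Table of this study.

```python
def choice_index(attempted: int, total: int) -> float:
    """Percentage of the total entry that attempted the question."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * attempted / total


TOTAL_CANDIDATES = 117

# Number of candidates attempting each question (from the study's table).
ATTEMPTS = {
    "I": 117, "II": 88, "III": 102, "IV": 108,
    "V": 4, "VI": 113, "VII": 42, "VIII": 117,
}

CHOICE_INDICES = {q: choice_index(n, TOTAL_CANDIDATES) for q, n in ATTEMPTS.items()}

for question, ci in CHOICE_INDICES.items():
    print(f"Question {question}: C.I. = {ci:.1f}%")
```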

Mean Ability Index (MT)

The Mean Ability Index is a measure of the ability of the group of students attempting
the question; it is equal to the mean of their total marks.

Facility Value (F.V.)

The Facility Value is a measure of the easiness or difficulty of the question.
It is defined as the mean percentage mark which a homogeneous group of average
ability candidates (MT = 50%) would be expected to obtain on the question.

Discrimination Index (D.I.)

The Discrimination Index is a measure of how the question discriminates between
candidates of different abilities and of how it is matched to other questions on
the paper. It is defined by the correlation coefficient (r) between the marks on the
question for those candidates attempting it and their total marks on the paper minus
their marks on this question, i.e. the correlation between Xq and XT - Xq,

where Xq = mark on the question for a particular student,
      XT = total mark on the section of the paper for that student.


This assumes linearity between the two distributions. In practice, this is usually the 
case for most questions and for a poorly matched question where it is not, a suffi¬ 
ciently close estimate is obtained for practical purposes. 
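The definition above can be sketched in Python as follows. This is an illustration only: the function names and the five candidates' marks are hypothetical and are not data from the study. The correlation used is the ordinary product-moment (Pearson) coefficient, which is what the linearity assumption suggests.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no spread")
    return sxy / sqrt(sxx * syy)


def discrimination_index(question_marks, total_marks):
    """Correlation between Xq and (XT - Xq) for candidates who attempted the question."""
    rest_marks = [xt - xq for xq, xt in zip(question_marks, total_marks)]
    return pearson_r(question_marks, rest_marks)


# Hypothetical marks for five candidates who attempted one question.
XQ = [2, 4, 5, 7, 9]        # marks on the question
XT = [20, 28, 31, 40, 47]   # total marks on the section

DI = discrimination_index(XQ, XT)
print(f"D.I. = {DI:.3f}")
```

Subtracting Xq from XT before correlating keeps the question's own marks out of the ability criterion, so the index is not inflated by the question being correlated with itself.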

The table below shows the Choice Index, Mean Ability Index, mean percentage
mark for the question, F.V. and D.I. calculated for each question.


Question Analysis Table

Question Number              I       II      III     IV      V       VI      VII     VIII

Choice Index (C.I.) in %     100     75      87      92      3.4     97      36      100
(number attempting)          (117)   (88)    (102)   (108)   (4)     (113)   (42)    (117)

Mean Ability Index (MT)      63.13   66.00   63.70   74.30   52.70   63.40   58.80   63.13
in %
(mean mark out of 65)        (41.03) (42.89) (41.38) (48.31) (34.25) (41.23) (38.21) (41.03)

Mean percentage mark         77.13   71.5    72.8    51.9    15.0    57.5    31.9    61.7
for the QN (Mq)

Facility Value (F.V.) in %   62.5    54      58.5    26.5    12      44      22      48

Discrimination Index (D.I.)  0.482   0.75    0.52    0.50    0.24    0.50    0.49    0.46


Question Synoptic Chart 


For each of the 8 questions, the mean percentage mark Mq and the mean percentage
total mark MT are found. These two values enable the question to be plotted on the
MT-Mq diagram. This is repeated for each question on the paper and a cluster of
question locations is obtained. Because this cluster of question locations provides
a synoptic picture of the examination, the graph is called the question synoptic chart.






The Facility Value (F.V.) for each question is not given by its Mq value. As has
been mentioned, this would be misleading in that it fails to take account of the ability of
the group attempting the question. If Mq = 31.9, this could arise from the question
being on the difficult side, but it could also arise from the fact that it was attempted by
a somewhat lower ability group. The F.V. for a question is found by drawing a line
of unit slope through the question plot and taking the value of Mq where this line
intersects the MT = 50% axis. In this way each question is in a sense standardized to
give the expected Mq value for a homogeneous group of average ability, i.e. MT = 50%.
The effect of choice is then effectively eliminated. It may be noted that all questions
on the Synoptic Chart which fall on the same line of unit slope will have the same
F.V.

Discrimination Index 


The Discrimination Index of any question (as has already been discussed) can
be found from either

1) the correlation between the marks for the question and the total
   marks on the paper or section, whichever is applicable, or

2) the correlation between the marks for the question and the total marks
   on the section minus the marks for the question, for those attempting
   the question.



The table below gives the discrimination indices of the 8 questions in Section A by
both methods.


TABLE OF DISCRIMINATION INDICES

Question Number     I      II     III    IV     V      VI     VII    VIII   Remarks

D.I. (Method I)     0.65   0.60   0.82   0.62   0.97   0.72   0.68   0.54   Correlation between QN mark
                                                                            and section mark

D.I. (Method II)    0.48   0.75   0.52   0.50   0.24   0.50   0.49   0.46   Correlation between QN mark
                                                                            and (section mark - QN mark)


However, only the D.I. values by Method II, viz. the correlation between the mark for the
question and (the section mark - mark for the question), have been taken as giving
accurate values.

Certain other things that can be said about F.V., MT and D.I. are:

1. The Mean Ability Index, MT, is the value of MT for each question. The
   F.V. is found from the intersection of the line of unit slope drawn
   through the question plot with the MT = 50% axis.

2. The difference between Mq and MT will not exceed 50% in practically
   all cases:

   |Mq - MT| <= 50%

3. The F.V. of a question may be calculated directly (without the need
   to draw the Question Synoptic Chart) by the formula

   F.V. = 50 + (Mq - MT) percent


   QN. NO.            I       II      III     IV      V       VI      VII     VIII

   (Mq - MT)          +14.00  +5.50   +9.10   -22.40  -37.70  -5.90   -26.90  -1.43

   F.V. (formula)     64.00   55.50   59.10   27.60   12.30   44.10   23.10   48.57

   F.V. (from QSC)    62.50   54.00   58.50   26.50   12.00   44.00   22.00   48.00


4. Those cases where the difference between Mq and MT happens to exceed
   50% will, by the definition of F.V., have a Facility Value of 100% or 0%,
   depending on the sign of the difference.
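The direct formula in point 3, together with the limiting cases in point 4, can be sketched in Python as below. The line of unit slope through the question plot (MT, Mq) meets the MT = 50% axis at Mq + (50 - MT), which is the formula used. The function name is hypothetical, and clamping to the 0 to 100% range is this sketch's reading of point 4; the (Mq, MT) pairs are those of the Question Analysis Table.

```python
def facility_value(mq: float, mt: float) -> float:
    """F.V. = 50 + (Mq - MT), limited to the 0-100% range (point 4)."""
    fv = 50.0 + (mq - mt)
    return max(0.0, min(100.0, fv))


# (Mq, MT) in percent for the eight questions, from the study's table.
QUESTIONS = {
    "I": (77.13, 63.13), "II": (71.5, 66.00), "III": (72.8, 63.70),
    "IV": (51.9, 74.30), "V": (15.0, 52.70), "VI": (57.5, 63.40),
    "VII": (31.9, 58.80), "VIII": (61.7, 63.13),
}

FACILITY_VALUES = {q: facility_value(mq, mt) for q, (mq, mt) in QUESTIONS.items()}

for question, fv in FACILITY_VALUES.items():
    print(f"Question {question}: F.V. = {fv:.2f}%")
```

The computed values agree with the formula row of the table above; the values read from the Question Synoptic Chart differ slightly, as would be expected of graphical readings.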


Discussion 


Referring to the graph (Question Synoptic Chart of the 8 questions), it will be
seen that QN I with MT = 63.13 tended to be preferred by lower ability students. On
the other hand, QN IV with MT = 74.30 was preferred by higher ability students.
These questions can be scrutinized to find out why this is so. The mean total
percentage mark for the entire group is 66% and all the questions (except IV) are
clustered around this value. Except for IV, the questions appear to have been attempted
by lower ability students, or this may indicate severe marking. Similarly, QN I with
F.V. = 62.5% may be easy or leniently marked.

The acceptability or otherwise of questions may be based on considerations similar
to those governing the choice of objective items after item analysis. Questions
with an F.V. of 40 to 60% could be considered acceptable if the discrimination is high
enough. Questions with an F.V. higher than 60% would be easy (or leniently marked)
and those with an F.V. below 40% difficult (or severely marked). The marking needs to be
scrutinized before a final decision is made regarding the alternatives. The
discrimination index needs to be high, although it is not expected that as wide a range
for this index will be obtained as in objective item analysis. The Mean Ability Index (MT)
affords some idea of the acceptability of the question to the students. If only questions
with high mean ability indices are included in the paper, this could tend to inhibit
the choice of the lower ability candidates, and probably vice versa.

The F.V. as given by this procedure is a conservative statistic as far as the
ability of the candidates answering the particular question is concerned. This assumes
a one-to-one correspondence between Mq and MT values.

It is possible to calculate the F.V.s of these 8 questions separately for the top 27%
subgroup and the bottom 27% subgroup and to compare them.


Top 27 % of 117 = 30 students 


QN. NO.                 I       II      III     IV      V       VI      VII     VIII

MT                      76.20   76.32   76.20   76.31   -       76.20   74.23   76.20

Mq                      87.55   78.88   86.07   60.17   -       69.00   52.50   70.50

F.V. = 50 + Mq - MT     61.35   52.56   59.87   33.86   -       42.80   28.27   44.30





Bottom 27% of 117 = 30 students 


QN. NO.                 I       II      III     IV      V       VI      VII     VIII

MT                      49.56   52.47   49.58   49.20   48.71   49.45   48.31   49.56

Mq                      66.22   57.14   59.37   42.50   10.00   47.14   23.75   54.16

F.V. = 50 + Mq - MT     66.66   54.67   59.79   43.30   11.29   47.69   25.44   54.60


The F.V. of an item in an objective type test is not a conservative or sample-free
statistic. If a group of students of different ability attempts the item, a different F.V.
will be obtained. This can lead to the rejection of items which might prove suitable
for another group of students. The quantity in the Question Synoptic Chart analogous to
the F.V. in objective item analysis is Mq, in that Mq is a function, among other things,
of both the question and the ability of the group choosing to answer the question.


QN. NO.                 I       II      III     IV      V       VI      VII     VIII

Average F.V.            64.00   53.61   59.83   38.58   5.64    45.24   26.85   49.45
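The averages tabulated above agree with the simple mean of the top 27% and bottom 27% F.V.s. For Question V, which has no top-group entry, the tabulated 5.64 is consistent with counting the missing value as zero. This is an inference from the figures, not a rule stated in the study, and it is marked as an assumption in the hypothetical Python sketch below.

```python
# F.V.s in percent from the top 27% and bottom 27% tables of the study.
# None marks the missing top-group entry for Question V.
TOP_FV = {
    "I": 61.35, "II": 52.56, "III": 59.87, "IV": 33.86,
    "V": None, "VI": 42.80, "VII": 28.27, "VIII": 44.30,
}
BOTTOM_FV = {
    "I": 66.66, "II": 54.67, "III": 59.79, "IV": 43.30,
    "V": 11.29, "VI": 47.69, "VII": 25.44, "VIII": 54.60,
}


def average_fv(top, bottom):
    """Mean of the two subgroup F.V.s.

    ASSUMPTION: a missing top-group value is counted as 0, which is what
    the tabulated average for Question V appears to imply.
    """
    top_value = 0.0 if top is None else top
    return (top_value + bottom) / 2.0


AVERAGE_FV = {q: average_fv(TOP_FV[q], BOTTOM_FV[q]) for q in TOP_FV}

for question, fv in AVERAGE_FV.items():
    print(f"Question {question}: average F.V. = {fv:.3f}%")
```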


FACILITY VALUES OF THE EIGHT QUESTIONS IN SECTION A BASED ON MORRISON 
INDEX (MODIFIED) 

F.V. = 50 + (Mq - MT)

where Mq = mean percentage mark on the question, as in the previous case,
      MT = Mean Ability Index, given by the mean percentage of (section
           total - mark on that question) of those who attempted it.

(Note: As in the case of discrimination, here also (section total - mark on the
question) is taken to give a better indication of ability.)


QN. Number                  I       II      III     IV      V       VI      VII     VIII

Mq (percent)                77.1    71.5    72.8    51.9    15.0    57.6    31.9    61.7

MT (percent)                58.9    64.9    62.5    65.3    59.5    64.3    63.6    63.3

(Mq - MT)                   18.2    6.6     10.3    -13.4   -44.5   -6.7    -31.7   -1.6

F.V. = (50 + Mq - MT)       68.2    56.6    60.3    36.6    5.5     43.3    18.3    48.4



